Ground truth bias in external cluster validity indices
External cluster validity indices (CVIs) are used to quantify the quality of a clustering by comparing the similarity between the clustering and a ground truth partition. However, some external CVIs show a biased behaviour when selecting the most similar clustering. Users may consequently be misguided by such results. Recognizing and understanding the bias behaviour of CVIs is therefore crucial...
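The abstract describes external CVIs as scoring a clustering by its similarity to a ground truth partition. The following minimal Python sketch shows two standard pair-counting external CVIs, the Rand index (RI) and the adjusted Rand index (ARI). These two indices are our choice for illustration. The abstract is truncated and does not say which indices the paper examines, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb


def pair_counts(labels_true, labels_pred):
    """Pair-counting agreement between a ground truth and a clustering.

    Returns (a, b, c, d):
      a: pairs placed together in both partitions
      b: pairs together in the ground truth only
      c: pairs together in the clustering only
      d: pairs separated in both partitions
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("partitions must cover the same objects")
    a = b = c = d = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            a += 1
        elif same_true:
            b += 1
        elif same_pred:
            c += 1
        else:
            d += 1
    return a, b, c, d


def rand_index(labels_true, labels_pred):
    """Fraction of object pairs on which the two partitions agree."""
    a, b, c, d = pair_counts(labels_true, labels_pred)
    return (a + d) / (a + b + c + d)


def adjusted_rand_index(labels_true, labels_pred):
    """Rand index corrected for chance (Hubert-Arabie form):
    (index - expected) / (max_index - expected), computed from the contingency table."""
    total = comb(len(labels_true), 2)
    sum_true = sum(comb(v, 2) for v in Counter(labels_true).values())
    sum_pred = sum(comb(v, 2) for v in Counter(labels_pred).values())
    sum_joint = sum(comb(v, 2) for v in Counter(zip(labels_true, labels_pred)).values())
    expected = sum_true * sum_pred / total
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (sum_joint - expected) / (max_index - expected)


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 1]        # toy ground truth: two classes of three
    pred = [0, 0, 1, 1, 2, 2]         # toy clustering: three clusters of two
    singletons = [0, 1, 2, 3, 4, 5]   # every object in its own cluster
    print(round(rand_index(truth, pred), 3))           # 0.667
    print(round(adjusted_rand_index(truth, pred), 3))  # 0.242
    print(round(rand_index(truth, singletons), 3))     # 0.6
    print(adjusted_rand_index(truth, singletons))      # 0.0
```

On this toy data the unadjusted RI gives the trivial all-singletons clustering a score of 0.6, while ARI gives it 0. This shows that the raw value of an index can flatter a degenerate clustering. It is only a generic illustration of why index behaviour matters, not the specific ground truth bias the paper analyzes.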
Similar resources
Selection Bias, Label Bias, and Bias in Ground Truth
Language technology is biased toward English newswire. In POS tagging, we get 97–98 words right out of 100 in English newswire, but results drop to about 8 out of 10 when running the same technology on Twitter data. In dependency parsing, we are able to identify the syntactic head of 9 out of 10 words in English newswire, but only 6–7 out of 10 in tweets. Replace references to Twitter with re...
Online Cluster Validity Indices for Streaming Data
Cluster analysis is used to explore structure in unlabeled data sets in a wide range of applications. An important part of cluster analysis is validating the quality of computationally obtained clusters. A large number of different internal indices have been developed for validation in the offline setting. However, this concept has not been extended to the online setting. A key challenge is to ...
Improving Cluster Method Quality by Validity Indices
Clustering attempts to discover significant groups present in a data set. Because it is an unsupervised process, it is difficult to define when a clustering result is acceptable. Several cluster validity indices have therefore been developed to evaluate the quality of clustering results. In this paper, we propose to improve the quality of a clustering algorithm called "CLUSTER" by using a validity ...
An Information-Theoretic External Cluster-Validity Measure
In this paper we propose a measure of similarity/association between two partitions of a set of objects. Our motivation is the desire to use the measure to characterize the quality or accuracy of clustering algorithms by somehow comparing the clusters they produce with "ground truth" consisting of classes assigned by manual means or some other means in whose veracity there is confidence....
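The entry "An Information-Theoretic External Cluster-Validity Measure" proposes an information-theoretic measure of association between two partitions. Its abstract is cut off, so the following sketch does not reproduce that paper's measure. It instead computes the widely used mutual information (MI) and normalized mutual information (NMI) between a ground truth and a clustering, a standard information-theoretic external index. The geometric-mean normalization and the function names are our assumptions, and other normalizations are also common.

```python
from collections import Counter
from math import log, sqrt


def entropy(labels):
    """Shannon entropy (natural log) of the cluster-size distribution."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(labels_true, labels_pred):
    """MI between two partitions, computed from their contingency counts."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("partitions must cover the same objects")
    n = len(labels_true)
    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    mi = 0.0
    for (t, p), n_tp in Counter(zip(labels_true, labels_pred)).items():
        # p(t, p) * log(p(t, p) / (p(t) * p(p))), with probabilities as counts / n
        mi += (n_tp / n) * log(n_tp * n / (count_true[t] * count_pred[p]))
    return mi


def normalized_mutual_information(labels_true, labels_pred):
    """MI divided by the geometric mean of the two entropies (one of several normalizations)."""
    h_true = entropy(labels_true)
    h_pred = entropy(labels_pred)
    if h_true == 0.0 or h_pred == 0.0:
        # Convention for single-cluster partitions: identical trivial partitions score 1.
        return 1.0 if h_true == h_pred else 0.0
    return mutual_information(labels_true, labels_pred) / sqrt(h_true * h_pred)


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 1]
    relabeled = [1, 1, 1, 0, 0, 0]  # same partition, different label names
    pred = [0, 0, 1, 1, 2, 2]
    print(normalized_mutual_information(truth, relabeled))  # approximately 1.0
    print(normalized_mutual_information(truth, pred))
```

Like the pair-counting indices, NMI depends only on which objects are grouped together, not on the label names. Relabeling a partition therefore leaves its score unchanged.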
Journal
Title: Pattern Recognition
Year: 2017
ISSN: 0031-3203
DOI: 10.1016/j.patcog.2016.12.003